Addressing the Resource Bottleneck to Create Large-Scale Annotated Texts

Authors

  • Jon Chamberlain
  • Massimo Poesio
  • Udo Kruschwitz
Abstract

Large-scale linguistically annotated resources have become available in recent years. This is partly due to sophisticated automatic and semiautomatic approaches that work well on specific tasks such as part-of-speech tagging. For more complex linguistic phenomena like anaphora resolution, there are no tools that produce high-quality annotations without massive user intervention. Annotated corpora of the size needed for modern computational linguistics research cannot, however, be created by small groups of hand annotators. The ANAWIKI project strikes a balance between collecting high-quality annotations from experts and applying a game-like approach to collecting linguistic annotation from the general Web population. More generally, ANAWIKI is a project that explores to what extent expert annotations can be substituted by a critical mass of non-expert judgements.

Similar resources

Exploiting parallel texts in the creation of multilingual semantically annotated resources: the MultiSemCor Corpus

In this article we illustrate and evaluate an approach to create high quality linguistically annotated resources based on the exploitation of aligned parallel corpora. This approach is based on the assumption that if a text in one language has been annotated and its translation has not, annotations can be transferred from the source text to the target using word alignment as a bridge. The trans...

Solving Re-entrant No-wait Flexible Flowshop Scheduling Problem; Using the Bottleneck-based Heuristic and Genetic Algorithm

In this paper, we study the re-entrant no-wait flexible flowshop scheduling problem with makespan minimization objective and then consider two parallel machines for each stage. The main characteristic of a re-entrant environment is that at least one job is likely to visit certain stages more than once during the process. The no-wait property describes a situation in which every job has its own ...

Play your way to an annotated corpus: Games with a purpose and anaphoric annotation

The lack of large-scale corpora annotated with semantic information has been a serious bottleneck for computational semantics, slowing down not only the development of more advanced statistical methods, but also our empirical understanding of the phenomena. The creation of the Ontonotes corpus will finally bring computational semantics to the point where computational syntax was in 1993 but in ...

An Efficient Approach for Bottleneck Resource(s) Detection Problem in the Multi-objective Dynamic Job Shop Environments

Nowadays, energy saving is one of the crucial aspects of decision making. One approach to it is the efficient use of resources in industrial systems. Studies of real manufacturing systems indicate that one or more machines may act as the Bottleneck Resource/Resources (BR). On the other hand, according to the Theory of Constraints (TOC), the efficient use of resources in manufact...

Towards Universal Web Parsebanks

Recently, there has been great interest both in the development of cross-linguistically applicable annotation schemes and in the application of syntactic parsers at web scale to create parsebanks of online texts. The combination of these two trends to create massive, consistently annotated parsebanks in many languages holds enormous potential for the quantitative study of many linguistic phenom...

Publication date: 2008